
    Cache Conscious Data Layouting for In-Memory Databases

    Many applications with manually implemented data management exhibit a data storage pattern in which semantically related data items are stored closer together in memory than unrelated ones. The strong semantic relationship between these data items commonly induces contemporaneous accesses to them. This is called the principle of data locality; it has been recognized by hardware vendors and is commonly exploited to improve hardware performance. General Purpose Database Management Systems (DBMSs), whose main goal is to simplify optimal data storage and processing, generally fall short of this goal because the usage pattern of the stored data cannot be anticipated when the system is designed. The current interest in column-oriented databases indicates that one strategy does not fit all applications. A DBMS that automatically adapts its storage strategy to the workload of the database promises a significant performance increase by maximizing the benefit of hardware optimizations that are based on the principle of data locality. This thesis gives an overview of optimizations that are based on the principle of data locality and the effect they have on the data access performance of applications. Based on the findings, a model is introduced that estimates the cost of data accesses based on the arrangement of the data in main memory. This model is evaluated through a series of experiments and incorporated into an automatic layouting component for a DBMS. This layouting component allows the calculation of an analytically optimal storage layout. The performance benefits brought by this component are evaluated in an application benchmark.
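    The following toy sketch illustrates the kind of cache-line-based reasoning such a cost model rests on; the cost function, the 64-byte line size, and all table dimensions are illustrative assumptions, not the model from the thesis.

        import math

        CACHE_LINE = 64  # bytes per cache line on typical x86 hardware (assumption)

        def lines_touched(n_records: int, stride: int, value_size: int) -> int:
            """Cache lines touched when reading one value_size-byte attribute
            per record, with consecutive values stride bytes apart in memory."""
            if stride <= value_size:  # values densely packed, e.g. a column layout
                return math.ceil(n_records * value_size / CACHE_LINE)
            # sparse accesses: at most one useful value per touched line
            return min(n_records, math.ceil(n_records * stride / CACHE_LINE))

        def scan_cost(n_records: int, record_size: int, attr_size: int, layout: str) -> int:
            """Estimated cache lines (a proxy for memory traffic) for scanning
            a single attribute under a row- or column-oriented layout."""
            if layout == "row":
                return lines_touched(n_records, record_size, attr_size)
            return lines_touched(n_records, attr_size, attr_size)

        # A narrow scan over wide records: the column layout touches ~16x fewer lines.
        print(scan_cost(1_000_000, 256, 4, "row"))     # 1000000
        print(scan_cost(1_000_000, 256, 4, "column"))  # 62500

    A layouting component in this spirit would compare such estimates across candidate layouts and pick the one with the lowest predicted memory traffic for the observed workload.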

    Efficient Cross-Device Query Processing

    The increasing diversity of hardware within a single system promises large performance gains but also poses a challenge for data management systems. Strategies for the efficient use of hardware with large performance differences are still lacking. For example, existing research on GPU-supported data management largely handles the GPU in isolation from the system’s CPU: the GPU is treated as the central processor, and the CPU is used only to mitigate the GPU’s weaknesses where necessary. To make efficient use of all available devices, we developed a processing strategy that lets unequal devices like GPU and CPU combine their strengths rather than work in isolation. To this end, we decompose relational data into individual bits and place the resulting partitions on the appropriate devices. Operations are processed in phases, each phase executed on one device. This way, we achieve significant performance gains and good load distribution among the available devices in a limited real-life use case. To grow this idea into a generic system, we identify challenges as well as potential hardware configurations and applications that can benefit from this approach.
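    A minimal sketch of the underlying bitwise decomposition, assuming 8-bit column values and an arbitrary split point; the "device" assignment in the comments is purely illustrative.

        def decompose(column, split=4):
            """Split each value into a high- and a low-order bit-partition."""
            hi = [v >> split for v in column]              # would reside on device A
            lo = [v & ((1 << split) - 1) for v in column]  # would reside on device B
            return hi, lo

        def recompose(hi, lo, split=4):
            """Losslessly reassemble the original values from the partitions."""
            return [(h << split) | low for h, low in zip(hi, lo)]

        column = [17, 200, 3, 255]
        hi, lo = decompose(column)
        assert recompose(hi, lo) == column

    Each partition is smaller than the original column, so a processing phase that needs only the high-order bits moves correspondingly less data to its device.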

    Accelerating Foreign-Key Joins using Asymmetric Memory Channels

    Indexed Foreign-Key Joins expose a very asymmetric access pattern: the Foreign-Key Index is sequentially scanned, whilst the Primary-Key table is the target of many quasi-random lookups, which are the dominant cost factor. To reduce the cost of the random lookups, the fact table can be (re-)partitioned at runtime to increase access locality on the dimension table and thus confine the random memory accesses to the CPU's cache. However, this is very hard to optimize, and the performance impact on recent architectures is limited because the partitioning costs consume most of the achievable join improvement. GPGPUs, on the other hand, have an architecture that is well suited for this operation: a relatively slow connection to the large system memory and a very fast connection to the smaller internal device memory. We show how to accelerate Foreign-Key Joins by executing the random table lookups in the GPU's VRAM while sequentially streaming the Foreign-Key Index through the PCI-E bus. We also experimentally study the memory access costs on GPU and CPU to estimate the benefit of this technique.
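    The access pattern can be sketched in a few lines; here a Python dict stands in for the dimension table held in fast device memory, and the chunked iteration mimics streaming the Foreign-Key Index over PCI-E. The chunk size and table contents are invented for illustration, not taken from the paper.

        def indexed_fk_join(fk_column, dim_table, chunk_size=1 << 16):
            """Stream the FK index sequentially; resolve the quasi-random
            lookups against the (device-resident) dimension table per chunk."""
            out = []
            for start in range(0, len(fk_column), chunk_size):
                chunk = fk_column[start:start + chunk_size]  # sequential stream
                out.extend(dim_table[fk] for fk in chunk)    # random lookups
            return out

        dim = {i: f"dim-{i}" for i in range(1000)}
        print(indexed_fk_join([7, 991, 7, 13], dim))
        # ['dim-7', 'dim-991', 'dim-7', 'dim-13']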

    X-Device Query Processing by Bitwise Distribution

    The diversity of hardware components within a single system calls for strategies for efficient cross-device data processing. For example, existing approaches to CPU/GPU co-processing distribute individual relational operators to the “most appropriate” device. While pleasantly simple, this strategy has a number of problems: it may leave the “inappropriate” devices idle while overloading the “appropriate” device and putting high pressure on the PCI bus. To address these issues, we distribute data among the devices by partially decomposing relations at the granularity of individual bits. Each of the resulting bit-partitions is stored and processed on one of the available devices. Using this strategy, we implemented a processor for spatial range queries that makes efficient use of all available devices. The performance gains achieved indicate that bitwise distribution makes a good cross-device processing strategy.
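    A hedged sketch of how a range predicate can be answered over such bit-partitions: the high-order partition is scanned first, and the low-order partition is consulted only where the high bits are inconclusive. The split point and the integer example are illustrative assumptions; the actual system processes spatial data on real devices.

        def range_filter(hi, lo, lower, upper, split=4):
            """Return positions i with lower <= value[i] <= upper, touching
            the low-order partition only at undecided positions."""
            hits = []
            for i, h in enumerate(hi):                   # phase 1: high bits only
                if lower >> split < h < upper >> split:
                    hits.append(i)                       # definitely inside
                elif h in (lower >> split, upper >> split):
                    value = (h << split) | lo[i]         # phase 2: refine with low bits
                    if lower <= value <= upper:
                        hits.append(i)
            return hits

        column = [17, 200, 3, 255]
        hi = [v >> 4 for v in column]
        lo = [v & 15 for v in column]
        print(range_filter(hi, lo, 10, 220))  # -> [0, 1] (17 and 200 qualify)

    Because phase 1 decides most positions on its own, the device holding the low-order partition sees only a fraction of the data.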

    Scalable Generation of Synthetic GPS Traces with Real-life Data Characteristics

    Database benchmarking is most valuable if real-life data and workloads are available. However, real-life data (and workloads) are often not publicly available due to IPR constraints or privacy concerns. Even when available, they are often limited regarding the scalability and variability of their data characteristics. On the oth
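    As a toy illustration of scalable trace generation, the sketch below emits synthetic GPS points from a seeded random walk; the model and every parameter here are invented and far simpler than a generator that preserves real-life data characteristics.

        import random

        def generate_trace(trace_id, n_points, start=(52.52, 13.40), step=0.001):
            """Yield one GPS trace as (trace_id, seq, lat, lon) tuples. Seeding
            the RNG per trace keeps generation deterministic and lets traces be
            produced independently in parallel, which is what makes it scale."""
            rng = random.Random(trace_id)
            lat, lon = start
            for seq in range(n_points):
                lat += rng.uniform(-step, step)
                lon += rng.uniform(-step, step)
                yield trace_id, seq, round(lat, 6), round(lon, 6)

        for point in generate_trace(trace_id=42, n_points=3):
            print(point)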

    Instant-on scientific data warehouses: Lazy ETL for data-intensive research

    In the dawning era of data-intensive research, scientific discovery deploys data analysis techniques similar to those that drive business intelligence. As in classical Extract, Transform and Load (ETL) processes, data is loaded entirely from external data sources (repositories) into a scientific data warehouse before it can be analyzed. This process is both time- and resource-intensive and may not be entirely necessary if only a subset of the data is of interest to a particular user. To overcome this problem, we propose a novel technique to lower the cost of data loading: Lazy ETL. Data is extracted and loaded transparently, on the fly, and only for the required data items. Extensive experiments demonstrate a significant reduction in the time from source-data availability to query answer compared to state-of-the-art solutions. In addition to reducing the cost of bootstrapping a scientific data warehouse, our approach also reduces the cost of loading new incoming data.
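    A minimal sketch of the lazy-loading idea, assuming trivial CSV sources; the class and all names are hypothetical stand-ins for the paper's integration with real repositories.

        import csv
        import io

        class LazyWarehouse:
            def __init__(self):
                self._sources = {}  # table name -> external source (here: CSV text)
                self._loaded = {}   # table name -> materialized rows

            def register(self, table, csv_text):
                """Instant-on: record metadata only; nothing is parsed yet."""
                self._sources[table] = csv_text

            def query(self, table, predicate):
                """Extract/transform/load runs on first access, not up front."""
                if table not in self._loaded:  # the lazy step
                    reader = csv.DictReader(io.StringIO(self._sources[table]))
                    self._loaded[table] = list(reader)
                return [row for row in self._loaded[table] if predicate(row)]

        wh = LazyWarehouse()
        wh.register("obs", "id,temp\n1,28.5\n2,31.2\n")  # cheap, no parsing yet
        print(wh.query("obs", lambda r: float(r["temp"]) > 30))  # loads here
        # -> [{'id': '2', 'temp': '31.2'}]

    Registration stays cheap regardless of source size, so the warehouse is queryable immediately; only the tables a query actually touches ever pay the loading cost.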

    Wildfire monitoring via the integration of remote sensing with innovative information technologies

    At the Institute for Space Applications and Remote Sensing of the National Observatory of Athens (ISARS/NOA), volumes of Earth Observation images of different spectral and spatial resolutions are processed on a systematic basis to derive thematic products that cover a wide spectrum of applications during and after a wildfire crisis, from fire detection and fire-front propagation monitoring to damage assessment in the afflicted areas. The processed satellite imagery is combined with auxiliary geo-information layers, including land use/land cover, administrative boundaries, road and rail networks, points of interest, and meteorological data, to generate and validate added-value fire-related products. The service portfolio has become available to institutional End Users that have a mandate to act on natural disasters and that have activated Emergency Support Services at the European level in the framework of the operational GMES projects SAFER and LinkER. Towards the goal of delivering integrated services for fire monitoring and management, ISARS/NOA employs observational capacities which include the operation of MSG/SEVIRI and NOAA/AVHRR receiving stations, NOA's in-situ monitoring networks for capturing meteorological parameters to generate weather forecasts, and datasets originating from the European Space Agency and third-party satellite operators. The operational activity of ISARS/NOA in the domain of wildfire management is greatly enhanced by the integration of state-of-the-art Information Technologies that have become available in the framework of the TELEIOS (EC/ICT) project. TELEIOS aims at the development of fully automatic processing chains relying on a) the effective storage and management of large amounts of EO and GIS data, b) the post-processing refinement of the fire products using semantics, and c) the creation of thematic maps and added-value services. The first objective is achieved with the use of advanced Array Database technologies, such as MonetDB, which enable efficient access to large archives of image data and metadata in a fully transparent way, without worrying about their format, size, and location, as well as efficient processing of such data using state-of-the-art implementations of image processing algorithms expressed in a high-level Scientific Query Language (SciQL). The product refinement is realized through update operations that incorporate human evidence and human logic, with semantic content extracted from thematic information coming from auxiliary geo-information layers and sources, considerably reducing the number of false alarms in fire detection and improving the credibility of the burnt-area assessment. The third objective is approached via the combination of the derived fire products with Linked Geospatial Data, structured accordingly and freely available on the web, using Semantic Web technologies. These technologies are built on top of a robust and modular computational environment, facilitating the efficient execution of several wildfire applications, such as real-time fire detection, fire-front propagation monitoring, rapid burnt-area mapping, detailed after-crisis burnt-scar mapping, and time-series analysis of burnt areas. The approach adopted allows ISARS/NOA to routinely serve requests from the end-user community, irrespective of the area of interest and its extent, the observation time period, or the data volume involved, granting the opportunity to combine innovative IT solutions with remote sensing techniques and

    Operational Wildfire Monitoring and Disaster Management Support Using State-of-the-art EO and Information Technologies

    Fires have been one of the main driving forces in the evolution of plants and ecosystems, determining the current structure and composition of landscapes. However, significant alterations in the fire regime have occurred in recent decades, primarily as a result of socioeconomic changes, dramatically increasing the catastrophic impacts of wildfires, as reflected in the increase during the 20th century of both the number of fires and the annual area burnt. The establishment of a permanent, robust fire monitoring system is therefore of paramount importance for implementing an effective environmental management policy. Such an integrated system has been developed at the Institute for Space Applications and Remote Sensing of the National Observatory of Athens (ISARS/NOA). Volumes of Earth Observation images of different spectral and spatial resolutions are processed on a systematic basis to derive thematic products that cover a wide spectrum of applications during and after a wildfire crisis, from fire detection and fire-front propagation monitoring to damage assessment in the afflicted areas. The processed satellite imagery is combined with auxiliary geo-information layers and meteorological data to generate and validate added-value fire-related products. The service portfolio has become available to institutional End Users with a mandate to act on natural disasters in the framework of the operational GMES projects SAFER and LinkER, addressing fire emergency response and emergency support needs for the entire European Union. Towards the goal of delivering integrated services for fire monitoring and management, ISARS/NOA employs observational capacities which include the operation of MSG/SEVIRI and NOAA/AVHRR receiving stations, NOA’s in-situ monitoring networks for capturing meteorological parameters to generate weather forecasts, and datasets originating from the European Space Agency and third-party satellite operators. The operational activity of ISARS/NOA in the domain of wildfire management is greatly enhanced by the integra

    Real-Time Wildfire Monitoring Using Scientific Database and Linked Data Technologies

    We present a real-time wildfire monitoring service that exploits satellite images and linked geospatial data to detect hotspots and monitor the evolution of fire fronts. The service makes heavy use of scientific database technologies (array databases, SciQL, data vaults) and linked data technologies (ontologies, linked geospatial data, stSPARQL) and is implemented on top of MonetDB and Strabon. The service is now operational at the National Observatory of Athens and was used during the previous summer by emergency managers monitoring wildfires in Greece.
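    As a rough illustration of the hotspot-detection step, the sketch below thresholds a thermal raster by brightness temperature; the 330 K cutoff, the frame values, and the function name are assumptions for illustration only, not NOA's actual detection algorithm.

        def detect_hotspots(brightness_temp, threshold_kelvin=330.0):
            """Return (row, col) pixels whose brightness temperature exceeds
            the threshold: candidate fire hotspots in one satellite frame."""
            return [
                (r, c)
                for r, row in enumerate(brightness_temp)
                for c, t in enumerate(row)
                if t > threshold_kelvin
            ]

        frame = [
            [295.0, 301.2, 298.4],
            [299.9, 341.7, 305.3],  # one anomalously hot pixel
        ]
        print(detect_hotspots(frame))  # -> [(1, 1)]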

    Building Virtual Earth Observatories using Ontologies and Linked Geospatial Data

    TELEIOS is a European project that addresses the need for scalable access to petabytes of Earth Observation data and the discovery of knowledge that can be used in applications. To achieve this, TELEIOS builds on scientific database technologies (array databases, SciQL, data vaults), Semantic Web technologies (stRDF and stSPARQL), and linked geospatial data. In this technical communication, we outline the advances TELEIOS makes over the state of the art and give an overview of its technical contributions to date.